One of the key challenges in deploying RL in real-world applications is adapting to variations in unknown environment contexts, such as changing terrain in robotic tasks and fluctuating bandwidth in congestion control. Existing work on adaptation to unknown environment contexts either assumes that the context stays the same for the whole episode or that the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes abruptly and unpredictably within an episode, resulting in a segment structure that existing work fails to address. To leverage the segment structure of piecewise-stable contexts in real-world applications, in this paper we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method jointly infers the belief distribution over the latent context together with the posterior over segment length, and performs more accurate belief context inference using the data observed within the current context segment. The inferred belief context can be used to augment the state, yielding a policy that adapts to abrupt variations in context. We demonstrate empirically that SeCBAD infers context segment length accurately and outperforms existing methods on a toy grid world environment and on MuJoCo tasks with piecewise-stable contexts.
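The abstract does not say how the posterior over segment length is computed. For intuition only, the classic Bayesian online changepoint detection recursion (Adams and MacKay) maintains exactly this kind of posterior over the current run (segment) length together with a per-segment belief about the context. The sketch below is that swapped-in technique, not SeCBAD itself, which learns its inference; it assumes a scalar Gaussian context observed with known noise variance, a conjugate Gaussian prior, and a constant hazard rate, and every function name and parameter is hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def run_length_posterior(xs, hazard=0.1, mu0=0.0, var0=10.0, obs_var=1.0):
    """Hypothetical sketch: for each step t, return P(run length = k | x_1..x_t).

    Index k counts the observations absorbed by the current segment.
    """
    probs = [1.0]   # run-length distribution before any data
    means = [mu0]   # per-run-length posterior mean of the latent context
    vars_ = [var0]  # per-run-length posterior variance of the latent context
    history = []
    for x in xs:
        # Predictive likelihood of x under each hypothesised current segment.
        pred = [normal_pdf(x, m, v + obs_var) for m, v in zip(means, vars_)]
        # Segment continues: run length k -> k + 1.
        growth = [p * l * (1 - hazard) for p, l in zip(probs, pred)]
        # Segment ends: run length resets to 0.
        cp = sum(p * l * hazard for p, l in zip(probs, pred))
        joint = [cp] + growth
        total = sum(joint)
        probs = [p / total for p in joint]
        # Conjugate Gaussian update of each segment's context belief;
        # a fresh segment (index 0) restarts from the prior.
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, vars_):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, vars_ = new_means, new_vars
        history.append(probs)
    return history

# Piecewise-stable context: 20 steps near 0, then an abrupt jump to 5.
xs = [0.0] * 20 + [5.0] * 20
hist = run_length_posterior(xs)
final = hist[-1]
print(max(range(len(final)), key=final.__getitem__))  # 20
```

Right after the jump, almost all of the mass moves to a short run length, and by the last step the most probable segment length is the 20 observations seen since the change.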
Conversational recommender systems (CRSs) often use external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, the original KGs employed in existing CRSs are often incomplete and sparse, which limits their reasoning capability in recommendation. Moreover, only a few existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address these issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to use the large dialogue corpus that naturally accompanies CRSs to enhance the incomplete KGs, and to perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we treat the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graph refactoring. We propose a variational Bayesian method to approximate the posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus to restore missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs into the decoding of recommendations and responses. We conduct experiments on two benchmark CRS datasets. Experimental results confirm the effectiveness of our proposed method.
Over the past few years, developing a broad, universal, and general-purpose computer vision system has become a hot topic. A powerful universal system would be capable of solving diverse vision tasks simultaneously without being restricted to a specific problem or a specific data domain, which is of great importance in practical real-world computer vision applications. This study pushes this direction forward by concentrating on the million-scale multi-domain universal object detection problem. The problem is not trivial due to its complicated nature in terms of cross-dataset category label duplication, label conflicts, and hierarchical taxonomy handling. Moreover, how to efficiently utilize emerging large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design, and resource-efficient model training with a pre-trained large model. Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope our detailed study will serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.
As an important data selection scheme, active learning has emerged as an essential component when iterating on an Artificial Intelligence (AI) model. It becomes even more critical given the dominance in applications of deep neural network-based models, which have large numbers of parameters and are data hungry. Despite its indispensable role in developing AI models, research on active learning is not as intensive as in other research directions. In this paper, we present a review of active learning through the lens of deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems that leverage, or could leverage, active learning for data iteration, and 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in the modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate the democratization of AI technologies by boosting model production at scale.
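As a concrete picture of "data selection" in a single active learning round, the sketch below shows pool-based uncertainty sampling with predictive entropy, one of the simplest query strategies in the literature. It is an illustrative assumption, not a method taken from this review; the function names and the toy probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool_probs, budget):
    """Hypothetical query step: pick the `budget` unlabeled samples whose
    predicted class distributions have the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical softmax outputs of the current model on four unlabeled images.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
]
print(select_for_labeling(pool, budget=2))  # [3, 1]
```

In a full loop, the selected samples would be sent for annotation, added to the training set, and the model retrained before the next query round.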
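Due to the lack of corpora for low-resource languages, current work on dialogue generation has mainly focused on English. In this paper, we introduce MDIA, the first large-scale multilingual benchmark for dialogue generation across low-resource languages. It covers real-life conversations in 46 languages from 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue pre-trained model mT5 as well as the English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better on sacreBLEU and BERTScore but worse on diversity. Even though promising results are found in few-shot and zero-shot scenarios, there is a large gap in generation quality between English and other languages. We hope the release of MDIA will encourage more work on multilingual dialogue generation to promote language diversity.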
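Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making schemes are promising for handling urban driving scenarios, they suffer from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve RL decision-making capabilities through better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also the intention awareness between agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is used to distill future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module, based on soft actor-critic (SAC), takes the refined latent scene representation from the Scene-Rep Transformer as input and outputs driving actions. The framework is validated in five challenging simulated urban scenarios, and its performance is quantitatively demonstrated by substantial improvements in data efficiency and in success rate, safety, and efficiency. Qualitative results show that our framework is able to extract the intentions of neighboring agents to help make decisions and to deliver more diversified driving behaviors.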
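This paper proposes a novel deep learning framework for multimodal motion prediction. The framework consists of three parts: a recurrent neural network that processes the target agent's motion history, a convolutional neural network that processes the rasterized environment representation, and a distance-based attention mechanism that handles the interactions between different agents. We validate the proposed framework on a large-scale real-world driving dataset, the Waymo Open Motion Dataset, and compare its performance with other methods on the standard test benchmark. Qualitative results show that the trajectories predicted by our model are accurate, diverse, and consistent with the road structure. Quantitative results on the standard benchmark show that our model outperforms other baseline methods in prediction accuracy and other evaluation metrics. The proposed framework won second place in the 2021 Waymo Open Dataset Motion Prediction Challenge.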
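The abstract names a "distance-based attention mechanism" without giving its formula. The sketch below illustrates one plausible form under a stated assumption: a scaled dot-product score between the target agent and each neighbor is penalized by `alpha` times their Euclidean distance before the softmax, so nearby agents dominate the aggregated interaction feature. The penalty form, `alpha`, and all names are hypothetical and are not taken from the paper.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distance_attention(target_pos, target_feat, neighbors, alpha=1.0):
    """Hypothetical distance-penalized attention over neighboring agents.

    neighbors: list of ((x, y), feature_vector) pairs.
    Returns the attention weights and the weighted sum of neighbor features.
    """
    d = len(target_feat)
    scores = []
    for pos, feat in neighbors:
        dist = math.dist(target_pos, pos)  # Euclidean distance (Python 3.8+)
        sim = sum(a * b for a, b in zip(target_feat, feat)) / math.sqrt(d)
        scores.append(sim - alpha * dist)  # far agents get lower scores
    weights = softmax(scores)
    context = [sum(w * feat[i] for w, (_, feat) in zip(weights, neighbors))
               for i in range(d)]
    return weights, context

weights, context = distance_attention(
    (0.0, 0.0), [1.0, 0.0],
    [((1.0, 0.0), [1.0, 0.0]),    # close neighbor
     ((10.0, 0.0), [0.0, 1.0])],  # distant neighbor
)
print(weights[0] > 0.99)  # True: the close agent receives almost all attention
```

A learned variant would replace the fixed `alpha` and raw features with trainable projections; the fixed penalty here only makes the distance effect easy to see.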
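Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.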
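Neural volume rendering enables photo-realistic renderings of human performers from free viewpoints, a critical task in immersive VR/AR applications. However, this practice is severely limited by the high computational cost of the rendering process. To address this problem, we propose UV Volumes, a new approach that can render an editable free-view video of a human performer in real time. It separates the high-frequency (i.e., non-smooth) human appearance from the 3D volume and encodes it into 2D neural texture stacks (NTS). The smooth UV volumes allow much smaller and shallower neural networks to obtain densities and texture coordinates in 3D while capturing detailed appearance in the 2D NTS. For editability, the mapping between the parameterized human model and the smooth texture coordinates allows better generalization to novel poses and shapes. Furthermore, the use of NTS enables interesting applications such as retexturing. Extensive experiments on the CMU Panoptic, ZJU Mocap, and H36M datasets show that our model can render 960 × 540 images at 30 FPS on average, with photo-realism comparable to state-of-the-art methods. The project and supplementary materials are available at https://github.com/fanegg/uv-volumes.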
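Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo given a query sketch. However, its widespread applicability is limited by the fact that it is difficult for most people to draw a complete sketch, and the drawing process often takes time. In this study, we aim to retrieve the target photo with the fewest possible strokes (an incomplete sketch), a setting named on-the-fly FG-SBIR (Bhunia et al. 2020), which starts retrieving at every stroke as soon as drawing begins. We argue that there is a significant correlation among the incomplete sketches in the sketch-drawing episode of each photo. To learn a more efficient joint embedding space shared between a photo and its incomplete sketches, we propose a multi-granularity association learning framework that further optimizes the embedding space of all incomplete sketches. Specifically, based on the completeness of the sketch, we divide a complete sketch episode into several stages, each corresponding to a simple linear mapping layer. Moreover, our framework guides the vector-space representation of the current sketch to approximate that of its later sketches, so that the retrieval performance of a sketch with fewer strokes approaches that of a sketch with more strokes. In our experiments, we propose a more realistic challenge, and our method achieves superior early retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.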